Finding Maximal Repetitions in a Word in Linear Time

Authors

  • Roman Kolpakov
  • Gregory Kucherov
Abstract

A repetition in a word w is a subword with a period of at most half of the subword length. We study maximal repetitions occurring in w, that is, those for which any extended subword of w has a bigger period. The set of such repetitions represents in a compact way all repetitions in w. We first prove a combinatorial result asserting that the sum of exponents of all maximal repetitions of a word of length n is bounded by a linear function in n. This implies, in particular, that there is only a linear number of maximal repetitions in a word. This allows us to construct a linear-time algorithm for finding all maximal repetitions. Some consequences and applications of these results are discussed, as well as related works.
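To make the definitions in the abstract concrete, here is a minimal sketch that enumerates maximal repetitions by brute force and sums their exponents (length divided by period). It is not the linear-time algorithm of the paper: it runs in quadratic time and is meant only to illustrate what is being counted. The function names `maximal_repetitions` and `sum_of_exponents` are hypothetical and chosen for this illustration.

```python
from fractions import Fraction


def maximal_repetitions(w):
    """Return sorted (start, end, period) triples, with end exclusive.

    Naive O(n^2) illustration, NOT the paper's linear-time method.
    A triple is reported when w[start:end] has period p, has length
    at least 2*p, and cannot be extended left or right with period p.
    """
    n = len(w)
    runs = {}
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            if w[i] != w[i + p]:
                i += 1
                continue
            start = i
            # Extend while the letter p positions ahead still matches.
            while i + p < n and w[i] == w[i + p]:
                i += 1
            end = i + p
            if end - start >= 2 * p:
                # Periods are tried in increasing order, so the first
                # period stored for an interval is its smallest one.
                runs.setdefault((start, end), p)
    return sorted((s, e, p) for (s, e), p in runs.items())


def sum_of_exponents(w):
    """Sum of length/period over all maximal repetitions of w."""
    return sum(Fraction(e - s, p) for s, e, p in maximal_repetitions(w))


if __name__ == "__main__":
    for s, e, p in maximal_repetitions("mississippi"):
        print(s, e, p, "mississippi"[s:e])
    print(sum_of_exponents("mississippi"))
```

On the word "mississippi" this reports the subword "ississi" with period 3 together with the three squares "ss", "ss" and "pp", so the number of maximal repetitions is far smaller than the number of all periodic subwords they account for.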

Similar resources

The Sum of Exponents of Maximal Repetitions in Standard Sturmian Words

A maximal repetition is a non-extendable (with the same period) periodic segment in a string, in which the period repeats at least twice. In this paper we study problems related to the structure of maximal repetitions in standard Sturmian words and present the formulas for the sum of their exponents. Moreover, we show how to compute the sum of exponents of maximal repetitions in any standard St...

Maximal repetitions and Application to DNA sequences

In this paper we describe an implementation of the Main-Kolpakov-Kucherov algorithm [9] for linear-time search for maximal repetitions in sequences. We first present the theoretical background and sketch the main components of the method. We also discuss how the method can be generalized to finding approximate repetitions. Then we discuss implementation decisions and present test examples of running the p...

On maximal repetitions of arbitrary exponent

The first two authors have shown [KK99, KK00] that the sum of the exponents (and thus the number) of maximal repetitions of exponent at least 2 (also called runs) is linear in the length of the word. The exponent 2 in the definition of a run may seem arbitrary. In this paper, we consider maximal repetitions of exponent strictly greater than 1.

On primary and secondary repetitions in words

Combinatorial properties of maximal repetitions (runs) in formal words are studied. We classify all maximal repetitions in a word as primary or secondary, where the set of all primary repetitions determines all the other repetitions in the word. Essential combinatorial properties of primary repetitions are established.

Computing Runs on a General Alphabet

We describe a RAM algorithm computing all runs (maximal repetitions) of a given string of length n over a general ordered alphabet in O(n log^{2/3} n) time and linear space. Our algorithm outperforms all known solutions working in Θ(n log σ) time provided σ = n^{Ω(1)}, where σ is the number of distinct letters in the input string. We conjecture that there exists a linear time RAM algorithm finding all ...


Publication date: 1999